4/1/2022

Getting Started

Setup

Welcome! While we’re waiting:

  • Navigate to the workshop webpage: https://github.com/dlab-berkeley/Census-Data-in-R

  • Scroll down and read the Readme section.

  • Clone or download the workshop files by clicking on the green CODE button.

    • If you download the zipfile, unzip it.

    • Make a note of the folder in which the workshop files reside.

Introduction

  • About me

  • About you

    • Your familiarity with US Census data
    • With geospatial data
    • With geospatial data in R

Outline

  • Brief overview of the primary US Census data products

  • Introduce R packages for working with census data

  • Use those packages to fetch census data

  • Use those packages to fetch census data plus census geographic boundary files

  • Make maps of census data

Census Data Overview

US Census Bureau

The “nation’s leading provider of quality data about its people and economy.”

- https://www.census.gov

Primary Census Products

  • Decennial Census

  • American Community Survey (ACS)

Decennial Census

Complete count of the population every 10 years since 1790

A snapshot of the American population in time, with an April 1 reference date.

Includes data on

  • Population: by sex, age, race/ethnicity, and family / household relationships

  • Housing: by occupancy (occupied, vacant), tenure (owned, rented), and group quarters

From 1840 to 2000, additional questions were asked of a sample of the population.

American Community Survey (ACS)

Since 2005, the American Community Survey (ACS) has replaced the decennial census sample data questions.

  • Annual survey of a sample of about 3.5 million households released for 1, 3 or 5 year period.

  • Provides period estimates of demographic, social, economic, and housing characteristics

  • Includes margin of error values for the estimates

ACS Data Products

ACS 1-year and 5-year estimates are currently available through 2020

  • New data is released at the end of the next year (e.g., 2020 data in Dec 2021)
  • But COVID is causing a delay in the release dates and the 2020 data was just released!!

ACS 3-year no longer available (2008—2013)

  • More data tables are available for the ACS 5-year estimates than for the ACS 1 year or ACS 3 year estimates.

See: Census ACS: Guidance for Data Users

ACS Period Estimates

The ACS 1 year estimates include data from a sample of the population collected over a one year period.

Five years of data are pooled together, weighted and processed as a whole dataset to create the ACS 5 year estimates.

Use the ACS 1 year estimates when you want the most current data and are less concerned about precision (larger margins of error). However, the ACS 1 year estimates are only available for areas with large populations (+65,000) and for a subset of data tables.

Use the ACS 5 year estimates when you want more stability in the estimates, more data tables, and smaller geographic tabulation units. But can be tricky to interpret the data if the five year period is not stable (e.g., covid and 2016-2022 ACS 5yr.)

Demographic* Social Economic Housing
Sex Families Income Tenure*
Age Education Benefits Occupancy*
Race Marital Status Employment Status Group quarters*
Hispanic Origin Fertility Occupation Housing Value
Relationships Grandparents Industry Taxes & Insurance
Veterans Commuting Utilities
Disability Status Place of Work Mortgage
Language at Home Health Insurance Monthly Rent
Citizenship Structure Type
*decennial census Mobility

Census Geographies

Census data is collected from individuals. The individual-level response data is called microdata.

For privacy reasons, only a very limited subset of census microdata is publicly available as the Public Use Microdata Samples (PUMS) data.

Most census data is made publicly available only when aggregated to a geographic tabulation unit.

  • Tabulation units include states, counties, census tracts, block groups, blocks, etc.

Not all census data is available for all geographic tabulation units. For example, only decennial census data are available at the block level.

Census Geographic Tabulation Units

Census Data and Census Geographies

Census Data Workflow

Identify your

  • Topic of interest, e.g., population by age, income, monthly rents, etc…
  • Dataset: Decennial Census or ACS 1-yr or ACS 5-yr?
  • Year(s): for what time period?
  • Geographic tabulation unit of aggregation (county, tract, etc.)
  • Geographic filter by state(s) or counties

Then determine what specific census variables are available for your topic.

CAUTION

“If you want to measure change you can’t change the measures!”

Census tables, variables, geographies, and geographic boundaries change over time!

Measuring change over time with census data is its own thing, complex, and not covered by this workshop!

Getting Census Data

Here are three of the primary websites from which you can directly download census data:

You can download Census geographic data directly on the census website or from NHGIS.

Census APIs

You can write code to fetch data from the Census Web APIs

  • API: application programming interface

  • Web API: URLs can be formatted to make queries that return data

Or you can leverage an existing R package to make this easier!

  • That’s what we will do!

Only a subset of recent Census data products are available via APIs.

R Packages for Working with Census Data

R Packages for Working with Census Data

tidycensus

An R package with functions that make it easier to fetch decennial census and ACS data from the Census APIs.

Only a limited set of Census data available via tidycensus

  • Decennial census: 1990, 2000, and 2010

  • ACS 1 yr: 2005 through 2019

  • ACS 5 yr: 2005—2009 through 2015—2019 are available.

Actively maintained and expanding to include more census data products (see tidycensus website)

About tidycensus

tidycensus tutorials

tidyverse

The tidyverse package is an umbrella package that installs all the core tidyverse packages and makes them easier to manage and load in R, including:

  • ggplot2, for data visualization
  • dplyr, for data manipulation
  • tidyr, for data tidying
  • readr, for data import
  • purrr, for functional programming
  • tibble, for tibbles, a modern re-imagining of data frames
  • stringr, for strings
  • forcats, for factors

Learn more about the tidyverse at https://www.tidyverse.org.

sf package

Simple features for geospatial data objects and methods.

  • The main R package for working with vector geospatial data
    • vector: locations represented as points, lines and polygons

sf is loaded and used automatically by tidycensus.

The online book Geocomputation with R is a great resource for learning about the sf package and working with geospatial data in R.

mapview

mapview provides functions for quickly and easily create interactive mapping visualizations for data exploration.

Requesting a Census API key

Hands-on Tutorial Time!

Setup

Clone or downloaded and unzip the workshop files from: https://github.com/dlab-berkeley/Census-Data-in-R

Then:

  1. Open the folder with the workshop files

  2. Double-click on the R Project file Census-Data-in-R.Rproj

  3. This should open RStudio - with the Files panel displaying the workshop folder contents.

  4. Double-click on the file Rcensus_data_maps-tutorial.Rmd to follow along!

You can also open the file Rcensus_data_maps-tutorial.html in the RStudio Files tab to follow along in a web brower.